ABSTRACT

In recent years there has been a considerable increase 
in the availability of continuous sensor measurements in a 
wide range of application domains, such as Location-Based 
Services (LBS), medical monitoring systems, manufacturing 
plants and engineering facilities to ensure efficiency, prod- 
uct quality and safety, hydrologic and geologic observing 
systems, pollution management, and others. 

Due to the inherent imprecision of sensor observations, 
many investigations have recently turned to querying, mining,
and storing uncertain data. Uncertainty can also be 
due to data aggregation, privacy-preserving transforms, and 
error-prone mining algorithms. 

In this study, we survey the techniques that have been 
proposed specifically for modeling and processing uncertain 
time series, an important model for temporal data. We pro- 
vide an analytical evaluation of the alternatives that have 
been proposed in the literature, highlighting the advantages 
and disadvantages of each approach, and further compare 
these alternatives with two additional techniques that were 
carefully studied before. We conduct an extensive experi- 
mental evaluation with 17 real datasets, and discuss some 
surprising results, which suggest that a fruitful research di- 
rection is to take into account the temporal correlations in 
the time series. Based on our evaluations, we also provide 
guidelines useful for the practitioners in the field. 

1. INTRODUCTION 

In the last decade there has been a dramatic explosion 
in the availability of measurements in a wide range of ap- 
plication domains, including traffic flow management, me- 
teorology, astronomy, remote sensing, and object tracking. 
Applications in the above domains usually organize these 
sequential measurements into time series, i.e., sequences of 
data points ordered along the temporal dimension, making 
time series a data type of particular importance. 

Several studies have recently focused on the problems of 
processing and mining time series with incomplete, imprecise,
and even misleading measurements [7, 14, 25, 26, 28]. 
Uncertainty in time series may occur for different reasons, 
such as the inherent imprecision of sensor observations, or 
privacy-preserving transformations. The following two ex- 
amples illustrate these two cases: 

• Personal information contributed by individuals and 
corporations is steadily increasing, and there is a par- 
allel growing interest in applications that can be devel- 
oped by mining these datasets, such as location-based 
services and social network applications. In these ap- 
plications privacy is a major concern, addressed by var- 
ious privacy-preserving transforms [2, 11, 20], which 
introduce data uncertainty. The data can still be mined 
and queried, but it requires a re-design of the existing 
methods in order to address this uncertainty. 

• In manufacturing plants and engineering facilities, sen- 
sor networks are being deployed to ensure efficiency, 
product quality and safety [14]: unexpected vibration 
patterns in production machines, or changes in chemi- 
cal composition in industrial processes, are used to pre- 
dict failures, suggesting repairs or replacements. The 
same is true in environmental science [12], where sen- 
sor networks are used in hydrologic and geologic ob- 
serving systems, pollution management in urban set- 
tings, and application of water and fertilizers in preci- 
sion agriculture. In transportation, sensor networks 
are employed to monitor weather and traffic condi- 
tions, and increase driving safety [21]. However, sensor 
readings are inherently imprecise because of the noise 
introduced by the equipment itself [7]. This translates 
to time series with uncertain values, and addressing 
this uncertainty can provide better results in terms of 
quality and efficiency. 

While the problem of managing and processing uncertain 
data has been studied in the traditional database literature 
since the 1980s [3], the attention of researchers was only recently
focused on the specific case of uncertain time series. 
Two main approaches have emerged for modeling uncertain 
time series. In the first, a probability density function (pdf) 
over the uncertain values is estimated by using some a pri- 
ori knowledge [30, 29, 23]. In the second, the uncertain data 
distribution is summarized by repeated measurements (i.e., 
samples) [5]. 

In this study, we revisit the techniques that have been 
proposed under these two approaches, with the aim of deter- 
mining their advantages and disadvantages. This is the first 
study to undertake a rigorous comparative evaluation of the 
techniques proposed in the literature for similarity matching 
of uncertain time series. The importance of such a study is 
underlined by two facts: first, the widespread existence of 
uncertain time series; and second, the observation that sim- 
ilarity matching serves as the basis for developing various 
more complex analysis and mining algorithms. Therefore, 
acquiring a deep understanding of the techniques proposed 
in this area is essential for the further development of the 
field of uncertain time series processing, and the applications 
that are built on top of it [27, 15, 17]. 

Our evaluation reveals the effectiveness of the techniques 
that have been proposed in the literature under different sce- 
narios. In the experiments, we stress-test the different tech- 
niques both in situations for which they were designed, as 
well as in situations that fall outside their normal operation 
(e.g., unknown distributions of the uncertain values). In the 
latter case, we wish to establish how strong the assumptions 
behind the design principles of each technique are, and to 
what extent these techniques can produce reliable and sta- 
ble results, when these assumptions no longer hold. We note 
that such situations do arise in practice, where it is not al- 
ways possible to know the exact data characteristics of the 
uncertain time series. 

Furthermore, we describe additional similarity measures 
for uncertain time series, inspired by the moving average, 
namely Uncertain Moving Average (UMA), and Uncertain 
Exponential Moving Average (UEMA). Even though these 
similarity measures are very simple, previous studies had 
not considered them. However, the experimental evaluation 
shows that they perform better than the more sophisticated 
techniques that have been proposed in the literature. The 
reason lies in the fact that UMA and UEMA incorporate 
some of the information inherent in the sequence of points 
in the time series, thus, taking a step back from the in- 
dependence assumption of the other techniques. We argue 
that these measures, which are simple and computationally 
efficient, should serve as the baseline for the problem of sim- 
ilarity matching in uncertain time series. 

Moreover, we make sure that the results of our exper- 
iments are completely reproducible. Therefore, we make 
publicly available the source code for all the algorithms used 
in our experiments, as well as the datasets upon which we 
tested them (see footnote 1). 

In summary, we make the following contributions. 

• We review the state-of-the-art techniques for similarity
matching in uncertain time series, and analytically 
evaluate them. Our analysis serves as a one-stop 
comparison of the proposed techniques in terms of re- 
quirements, input data assumptions, and applicability 
to different situations. 

• We propose a methodology for comparing these tech- 
niques, based on the similarity matching task. This 
methodology provides a common ground for the fair 
comparison of all the techniques. 

• We perform an extensive experimental evaluation, using
17 real datasets from diverse domains. In our experiments,
we evaluate the techniques using a multitude
of different conditions, and input data characteristics.
Moreover, we stress-test the techniques by evaluating
their performance on datasets for which they 
have not been designed to operate. 

1 Source code and datasets: 

• We describe and evaluate two additional similarity mea- 
sures for uncertain time series, that were not studied 
in this context before. These measures are based on 
moving average, and one of them also employs exponential
decay. We demonstrate that the new measures
achieve better performance than the similarity 
measures in the literature, which is an unexpected re- 
sult. 

• Finally, we provide a discussion of the results, and 
complement this discussion with thoughts on interest- 
ing research directions, as well as useful guidelines for 
the practitioners in the field. 

The rest of this paper is structured as follows. In Section 2 
we survey the principal representations and distance mea- 
sures proposed for similarity matching of uncertain time se- 
ries. In Section 3, we analytically compare the methods pro- 
posed for uncertain time series modeling, and in Section 4, 
we present the experimental comparison. We describe new 
measures of similarity matching in Section 5, and evaluate 
their performance in relation to the other measures. Finally, 
in Section 6 we summarize the results, and Section 7 con- 
cludes the study. 

2. SIMILARITY MATCHING FOR UNCERTAIN TIME SERIES 

Time series are sequences of points, typically real-valued 
numbers, ordered along the temporal dimension. We assume 
constant sampling rates and discrete timestamps. Formally, 
a time series S is defined as $S = \langle s_1, s_2, \ldots, s_n \rangle$, where n 
is the length of S, and $s_i$ is the real-valued number of S 
at timestamp i. Where not specified otherwise, we assume 
normalized time series with zero mean and unit variance. 
Notice that normalization is a preprocessing step that re- 
quires particular care to address specific situations [16]. 
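
As a minimal sketch of the normalization step assumed above, the following Python function rescales a series to zero mean and unit variance. The function name `z_normalize`, the use of the population variance, and the handling of constant series are our own illustrative choices; they are one example of the special situations mentioned above, not a prescription taken from [16].

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) variance."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series) / n
    std = math.sqrt(variance)
    if std == 0.0:
        # A constant series has no spread; returning zeros is an assumption.
        return [0.0] * n
    return [(v - mean) / std for v in series]


normalized = z_normalize([1.0, 2.0, 3.0, 4.0])
```
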

In this study, we focus on uncertain time series where 
uncertainty is localized and limited to the points. Formally, 
an uncertain time series T is defined as a sequence of random 
variables $\langle t_1, t_2, \ldots, t_n \rangle$, where $t_i$ is the random variable 
modeling the real-valued number at timestamp i. All 
three models we review and compare fit under this general 
definition. 

The problem of similarity matching has been extensively 
studied in the past [4, 10, 22, 13, 8, 19, 18, 16]: given 
a user-supplied query sequence, a similarity search returns 
the most similar time series according to some distance function.
More formally, given a collection of time series
$C = \{S_1, \ldots, S_N\}$, where N is the number of time series, we are
interested in evaluating the range query function $RQ(Q, C, \varepsilon)$: 

$$RQ(Q, C, \varepsilon) = \{S \mid S \in C,\ \mathit{distance}(Q, S) \leq \varepsilon\} \tag{1}$$

In the above equation, $\varepsilon$ is a user-supplied distance threshold.
A survey of representation and distance measures for 
time series can be found in [9]. 
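
The range query above can be sketched in a few lines of Python. This is only an illustration for certain (exact) time series; the names `euclidean` and `range_query` are ours, and the Euclidean distance is used as one possible choice of distance function.

```python
import math


def euclidean(q, s):
    """Euclidean distance between two equal-length sequences."""
    if len(q) != len(s):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def range_query(query, collection, eps, distance=euclidean):
    """Return every series in the collection within distance eps of the query."""
    return [s for s in collection if distance(query, s) <= eps]


collection = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]
result = range_query([0.0, 0.0, 0.0], collection, eps=2.0)
```

Any other distance function with the same signature can be passed through the `distance` parameter.
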

A similar problem arises also in the case of uncertain time 
series, and the problem of probabilistic similarity matching 
has been introduced in recent years. Formally, given a 
collection of uncertain time series $C = \{T_1, \ldots, T_N\}$, we are 
interested in evaluating the probabilistic range query function
$PRQ(Q, C, \varepsilon, \tau)$: 

$$PRQ(Q, C, \varepsilon, \tau) = \{T \mid T \in C,\ \Pr(\mathit{distance}(Q, T) \leq \varepsilon) \geq \tau\} \tag{2}$$

In the above equation, $\varepsilon$ and $\tau$ are the user-supplied distance
threshold and the probabilistic threshold, respectively. 

In recent years, three techniques have been proposed 
to evaluate PRQ queries, namely MUNICH [5] (see footnote 2), PROUD 
[29], and DUST [23]. As we discuss below, these methods 
assume that neighboring points of the time series are independent,
i.e., the point at timestamp i is independent from 
the point at timestamp i + 1. Evidently, this is a simplifying 
assumption, since in real-world datasets neighboring points 
are correlated. We revisit this issue in the following sections. 

We now discuss each one of the above three techniques in 
more detail. 

2.1 MUNICH 

In [5], uncertainty is modeled by means of repeated ob- 
servations at each timestamp, as depicted in Figure 2. 

Assuming two uncertain time series, X and Y, MUNICH 
proceeds as follows. First, the two uncertain sequences X, Y 
are materialized to all possible certain sequences:
$TS_X = \{\langle v_{11}, \ldots, v_{n1} \rangle, \ldots, \langle v_{1s}, \ldots, v_{ns} \rangle\}$
(where $v_{ij}$ is the j-th observation in timestamp i), and similarly for Y with 
$TS_Y$. Thus, we have now defined $TS_X$, $TS_Y$. The set of 
all possible distances between X and Y is then defined as 
follows: 

$$\mathit{dists}(X, Y) = \{L_p(x, y) \mid x \in TS_X,\ y \in TS_Y\} \tag{3}$$

The uncertain $L_p$ distance is formulated by means of counting
the feasible distances: 

$$\Pr(\mathit{distance}(X, Y) \leq \varepsilon) = \frac{|\{d \in \mathit{dists}(X, Y) \mid d \leq \varepsilon\}|}{|\mathit{dists}(X, Y)|} \tag{4}$$

Once we compute this probability, we can determine the 
result set of PRQs similarity queries by filtering all uncertain 
sequences using Equation 4. 

Note that the naive computation of the result set is infeasible,
because of the very large space that leads to an exponential
computational cost: $|\mathit{dists}(X, Y)| = s_X^n \, s_Y^n$, where 
$s_X, s_Y$ are the number of samples at each timestamp of 
X, Y, respectively, and n is the length of the sequences. 

Efficiency can be ensured by upper and lower bounding the 
distances, and summarizing the repeated samples using minimal
bounding intervals [5]. This framework has been applied
to Euclidean and Dynamic Time Warping (DTW) [6] 
distances and guarantees no false dismissals in the original 
space [5]. 

Figure 2: Example of uncertain time series $X = \{x_1, \ldots, x_n\}$
modeled by means of repeated observations. 

2 We will refer to this method as MUNICH (it was not explicitly
named in the original paper), since all the authors 
were affiliated with the University of Munich. 

2.2 PROUD 

In [29], an approach for processing queries over PRObabilistic
Uncertain Data streams (PROUD) is presented. Inspired
by the Euclidean distance, the PROUD distance is 
modeled as the sum of the differences of the streaming time 
series random variables, where each random variable represents
the uncertainty of the value in the corresponding 
timestamp. This model is illustrated in Figure 1. 

Figure 1: Example of uncertain time series $X = \{x_1, \ldots, x_n\}$
modeled by means of pdf estimation. 

Given two uncertain time series X, Y, their distance is 
defined as: 

$$\mathit{distance}(X, Y) = \sum_i D_i^2 \tag{5}$$

where $D_i = (x_i - y_i)$ are random variables, as shown in 
Figure 3. 

Figure 3: The probabilistic distance model. 

According to the central limit theorem, we have that the 
cumulative distribution of the distances approaches a normal
distribution: 

$$\mathit{distance}(X, Y)_{norm} = \frac{\mathit{distance}(X, Y) - \sum_i E[D_i^2]}{\sqrt{\sum_i Var[D_i^2]}} \tag{6}$$

The normalized distance follows a standard normal distribution,
thus we can obtain the normal distribution of the 
original distance as follows: 

$$\mathit{distance}(X, Y) \propto N\Big(\sum_i E[D_i^2],\ \sum_i Var[D_i^2]\Big) \tag{7}$$

The interesting result here is that, regardless of the data 
distribution of the random variables composing the uncertain
time series, the cumulative distribution of their distances
(1) is defined similarly to their Euclidean distance 
and (2) approaches a normal distribution. Recall that we 
want to answer PRQs similarity queries. First, given a probability
threshold $\tau$ and the cumulative distribution function 
(cdf) of the normal distribution, we compute $\varepsilon_{limit}$ such 
that: 

$$\Pr(\mathit{distance}(X, Y)_{norm} \leq \varepsilon_{limit}) \geq \tau \tag{8}$$

The cdf of the normal distribution can be formulated in 
terms of the well-known error function, and $\varepsilon_{limit}$ can be 
determined by looking up the statistics tables. Once we 
have $\varepsilon_{limit}$, we proceed by computing also the normalized $\varepsilon$: 

$$\varepsilon_{norm}(X, Y) = \frac{\varepsilon^2 - E[\mathit{distance}(X, Y)]}{\sqrt{Var[\mathit{distance}(X, Y)]}} \tag{9}$$

Then, we have that if a candidate uncertain series Y satisfies
the inequality: 

$$\varepsilon_{norm}(X, Y) \geq \varepsilon_{limit} \tag{10}$$

then the following equation holds: 

$$\Pr(\mathit{distance}(X, Y)_{norm} \leq \varepsilon_{norm}(X, Y)) \geq \tau \tag{11}$$

Therefore, Y can be added to the result set. Otherwise, 
it is pruned away. This distance formulation is statistically 
sound and only requires knowledge of the general characteristics
of the data distribution, namely, its mean and variance. 

2.3 DUST 

In [23], the authors propose a new distance measure, DUST, 
that compared to MUNICH, does not depend on the existence
of multiple observations and is computationally more 
efficient. Similarly to [29], DUST is inspired by the Euclidean
distance, but works under the assumption that all 
the time series values follow some specific distribution. 

Given two uncertain time series X, Y, the distance between
two uncertain values $x_i, y_i$ is defined as the distance 
between their true (unknown) values $r(x_i), r(y_i)$:
$\mathit{dist}(x_i, y_i) = L_1(r(x_i), r(y_i))$. This distance can then be used to define 
a function $\phi$ that measures the similarity of two uncertain 
values: 

$$\phi(|x_i - y_i|) = \Pr(\mathit{dist}(0, |r(x_i) - r(y_i)|) = 0) \tag{12}$$

This basic similarity function is then used inside the dust 
dissimilarity function: 

$$\mathit{dust}(x, y) = \sqrt{-\log(\phi(|x - y|)) - k}$$

with 

$$k = -\log(\phi(0))$$

The constant k has been introduced to support reflexivity. 
Once we have defined the dust distance between uncertain 
values, we are ready to extend it to the entire sequences: 

$$\mathit{DUST}(X, Y) = \sqrt{\sum_i \mathit{dust}(x_i, y_i)^2} \tag{13}$$



The handling of uncertainty has been isolated inside the 
$\phi$ function, and its evaluation requires exact knowledge of the 
data distribution. In contrast to the techniques we reviewed 
earlier, the DUST distance is a real number that measures 
the dissimilarity between uncertain time series. Thus, it can 
be used in all mining techniques for certain time series, by 
simply substituting the existing distance function. 

Finally, we note that DUST is equivalent to the Euclidean 
distance, in the case where the error of the time series values 
follows the normal distribution. 
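
The relation between DUST and the Euclidean distance under normal errors can be checked numerically. The sketch below is illustrative only: it assumes that $\phi$ is proportional to the density of the difference of two independent zero-mean normal errors (our simplification of the model in [23]), and the names `phi_normal`, `dust` and `dust_distance` are ours. Under this assumption, with the same standard deviation $\sigma$ for every value, the DUST value works out to the Euclidean distance divided by $2\sigma$.

```python
import math


def phi_normal(delta, sigma_x, sigma_y):
    """Assumed similarity: density of the difference of two normal errors at delta."""
    s2 = sigma_x ** 2 + sigma_y ** 2
    return math.exp(-delta * delta / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)


def dust(x, y, sigma_x, sigma_y):
    """dust(x, y) = sqrt(-log(phi(|x - y|)) - k), with k = -log(phi(0))."""
    k = -math.log(phi_normal(0.0, sigma_x, sigma_y))
    value = -math.log(phi_normal(abs(x - y), sigma_x, sigma_y)) - k
    return math.sqrt(max(value, 0.0))  # guard against tiny negative round-off


def dust_distance(xs, ys, sigmas_x, sigmas_y):
    """DUST(X, Y) = sqrt(sum_i dust(x_i, y_i)^2), one sigma per timestamp."""
    total = sum(dust(x, y, sx, sy) ** 2
                for x, y, sx, sy in zip(xs, ys, sigmas_x, sigmas_y))
    return math.sqrt(total)


xs, ys = [0.0, 1.0, 2.0], [1.0, 1.0, 4.0]
sigma = 0.5
d_dust = dust_distance(xs, ys, [sigma] * 3, [sigma] * 3)
d_eucl = math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)))
```

Because the per-timestamp standard deviations are passed as lists, the same functions also accept errors whose standard deviation varies along the series.
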

3. ANALYTICAL COMPARISON 

In this section, we compare the three models of similar- 
ity matching for uncertain time series, namely, MUNICH, 
PROUD and DUST, along the following dimensions: un- 
certainty models used and assumptions made by the algo- 
rithms; type of distance measures; and type of similarity 
queries. 

3.1 Uncertainty Models and Assumptions 

All three techniques we have reviewed are based on the 
assumption that the values of the time series are indepen- 
dent from one another. That is, the value at each timestamp 
is assumed to be independently drawn from a given distri- 
bution. Evidently, this is a simplifying assumption, since 
neighboring values in time series usually have a strong tem- 
poral correlation. 

The main difference between MUNICH and the other two 
techniques is that MUNICH represents the uncertainty of 
the time series values by recording multiple observations for 
each timestamp. This can be thought of as sampling from 
the distribution of the value errors. In contrast, PROUD 
and DUST consider each value of time series to be a contin- 
uous random variable following a certain probability distri- 
bution. 

The amount of preliminary information, i.e. a priori knowl- 
edge of the characteristics of the time series values and their 
errors, varies greatly among the techniques. MUNICH does 
not need to know the distribution of the time series values, 
or the distribution of the value errors. It simply operates on 
the observations available at each timestamp. 
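
To make the sample-based model concrete, the following sketch evaluates the probability of Equation 4 by brute force, materializing every certain sequence from the observations at each timestamp. It is meant only for tiny inputs: it does not implement the bounding and summarization optimizations of [5], and the function names are our own.

```python
import itertools


def materialize(samples):
    """samples[i] lists the observations at timestamp i; return all certain sequences."""
    return list(itertools.product(*samples))


def lp_distance(x, y, p=2):
    """L_p distance between two certain sequences of equal length."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def munich_probability(x_samples, y_samples, eps, p=2):
    """Fraction of all materialized distances that do not exceed eps (Equation 4)."""
    ts_x = materialize(x_samples)
    ts_y = materialize(y_samples)
    dists = [lp_distance(x, y, p) for x in ts_x for y in ts_y]
    return sum(1 for d in dists if d <= eps) / len(dists)


# Two timestamps with two observations each for X; a single observation each for Y.
X = [[0.0, 1.0], [0.0, 1.0]]
Y = [[0.0], [0.0]]
prob = munich_probability(X, Y, eps=1.0)
```

Here X materializes to four certain sequences, three of which lie within distance 1 of Y, so the probability is 3/4. The number of materialized pairs grows exponentially with the series length, which is why the naive approach is infeasible in practice.
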

On the other hand, PROUD and DUST need to know the 
distribution of the error at each value of the data stream. 
In particular, PROUD requires knowledge of the standard deviation
of the uncertainty error, and a single observed value 
for each timestamp. PROUD assumes that the standard de- 
viation of the uncertainty error remains constant across all 
timestamps. 
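
The PROUD test of Section 2.2 can be sketched as follows. This is a hedged illustration, not the implementation of [29]: we assume zero-mean, independent, normally distributed errors with constant standard deviations `sigma_x` and `sigma_y`, so that the moments of $D_i^2$ have a closed form, and we compare the sum of squared differences against $\varepsilon^2$. The function name `proud_accepts` is ours.

```python
import math
from statistics import NormalDist


def proud_accepts(x_obs, y_obs, sigma_x, sigma_y, eps, tau):
    """Return True if the candidate passes the normalized-threshold test."""
    s2 = sigma_x ** 2 + sigma_y ** 2  # variance of each difference D_i
    if s2 <= 0.0:
        raise ValueError("at least one error standard deviation must be positive")
    mean_total = 0.0
    var_total = 0.0
    for x, y in zip(x_obs, y_obs):
        mu = x - y  # assumed mean of D_i: difference of the observed values
        mean_total += mu * mu + s2  # E[D_i^2] for a normal D_i
        var_total += 2.0 * s2 * s2 + 4.0 * mu * mu * s2  # Var[D_i^2] for a normal D_i
    # Normalized threshold; eps is squared because the distance is a sum of squares.
    eps_norm = (eps ** 2 - mean_total) / math.sqrt(var_total)
    # Smallest normalized value whose standard normal cdf reaches tau (0 < tau < 1).
    eps_limit = NormalDist().inv_cdf(tau)
    return eps_norm >= eps_limit


x = [0.0] * 10
y = [0.0] * 10
accepted_far = proud_accepts(x, y, 0.1, 0.1, eps=2.0, tau=0.9)
accepted_near = proud_accepts(x, y, 0.1, 0.1, eps=0.1, tau=0.9)
```

With a generous threshold the candidate is accepted, while a threshold smaller than the expected noise level leads to pruning even though the observed values coincide.
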

DUST uses the largest amount of information among the 
three techniques. It takes as input a single observed value 
of the time series for each timestamp, just like PROUD. In 
addition, DUST needs to know the distribution of the uncer- 
tainty error at each time stamp, as well as the distribution of 
the values of the time series. This means that, in contrast to 
PROUD, DUST can take into account mixed distributions 
for the uncertainty errors (albeit, they have to be explicitly 
provided in the input). 

Overall, we observe that the three techniques make different
initial assumptions about the amount of information 
available for the uncertain time series, and have different input
requirements. Consequently, when deciding which technique
to use, users should take into account the information 
available on the uncertainty of the time series to be processed. 

3.2 Type of Distance Measures 

All the considered techniques use some variation of the 
Euclidean distance. MUNICH and PROUD use this distance
in a straightforward manner. Moreover, MUNICH
and DUST can be employed to compute the Dynamic 
Time Warping distance [24], which is a more flexible distance 
measure. 

DUST is a new type of distance, specifically designed for 
uncertain time series. In other words, DUST is not a simi- 
larity matching technique per se, but rather a new distance 
measure. It has been shown that DUST is proportional to 
the Euclidean distance in the cases where the value errors 
are normally distributed [23]. Moreover, the authors of [23] 
note that if all the value errors follow the same distribution, 
then it is better to use the Euclidean distance. DUST be- 
comes useful when the value errors are modeled by multiple 
error distributions. 

3.3 Type of Similarity Queries 

MUNICH and PROUD are designed for answering probabilistic
range queries (defined in Section 2). Being 
a distance measure, DUST can be used to answer top-k nearest 
neighbor queries, or to perform top-k motif search. 

MUNICH and PROUD solve the similarity matching problem
that is described by Equation 2, resulting in a set of time 
series that belong to the answer with a certain probability, 
$\tau$. On the other hand, DUST produces a single value that 
is an exact (i.e., not probabilistic) distance between two un- 
certain time series. 

In Section 4, we describe the methodology we used in 
order to compare all three techniques using the same task, 
that of similarity matching. 

4. COMPARATIVE STUDY 

In this section, we present the experimental evaluation 
of the three techniques. We first describe the methodology 
and datasets used, and then discuss the results of the exper- 
iments. 

All techniques were implemented in C++, and the exper- 
iments were run on a PC with a 2.13GHz CPU and 4GB of 
RAM. 

The source code for all the algorithms used in our exper- 
iments, as well as the datasets upon which we tested them 
are publicly available (see footnote 1). 

4.1 Experimental Setup 

4.1.1 Datasets 

Similarly to [5, 29, 23], we used existing time series datasets 
with exact values as the ground truth, and subsequently in- 
troduced uncertainty through perturbation. Perturbation 
models errors in measurements, and in our experiments we 
consider uniform, normal and exponential error distributions 
with zero mean and varying standard deviation within the
interval [0.2, 2.0]. 
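
The perturbation step can be sketched as follows. The parameterizations are our assumptions about how zero-mean errors with a given standard deviation may be generated: in particular, the exponential error is shifted by its mean so that it is centred at zero. The names `sample_error` and `perturb` are illustrative.

```python
import math
import random


def sample_error(kind, sigma, rng):
    """Draw one zero-mean error with standard deviation sigma."""
    if kind == "normal":
        return rng.gauss(0.0, sigma)
    if kind == "uniform":
        half_width = sigma * math.sqrt(3.0)  # U(-a, a) has std a / sqrt(3)
        return rng.uniform(-half_width, half_width)
    if kind == "exponential":
        # An exponential with mean sigma also has std sigma; subtract the mean.
        return rng.expovariate(1.0 / sigma) - sigma
    raise ValueError("unknown error distribution: " + kind)


def perturb(series, kind, sigma, seed=0):
    """Return an observed series: exact values plus independent errors."""
    rng = random.Random(seed)
    return [v + sample_error(kind, sigma, rng) for v in series]


observed = perturb([0.0, 0.5, 1.0, 0.5, 0.0], "normal", sigma=0.2)
```
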

We considered 17 real datasets from the UCR classification
datasets collection [1], representing a wide range of 
application domains: 50words, Adiac, Beef, CBF, Coffee, 
ECG200, FISH, FaceAll, FaceFour, Gun_Point, Lighting2, 
Lighting7, OSULeaf, OliveOil, SwedishLeaf, Trace, and synthetic_control.
The training and testing sets were joined together,
and we obtained on average 502 time series of length 
290 per dataset. We stress the fact that each dataset contains
several time series instances. 

Since DUST requires knowledge of the distribution of values of 
the time series, and additionally makes the assumption that 
this distribution is uniform [23], we tested the datasets to 
check if this assumption holds. According to the Chi-square 
test, the hypothesis that the datasets follow the uniform 
distribution was rejected (for all datasets) at significance 
level $\alpha = 0.01$. Evidently, the above assumption does not 
hold for these datasets; however, DUST still needs it in order 
to operate. 



4.1.2 Comparison Methodology 

In our evaluation, we consider all three techniques, namely, 
MUNICH, PROUD, and DUST, and we additionally com- 
pare to Euclidean distance. When using Euclidean distance, 
we do not take into account the distributions of the values 
and their errors: we just use a single value for every times- 
tamp, and compute the traditional Euclidean distance based 
on these values. 

The goal of our evaluation is to compare the performance 
of the different techniques on the same task. Observe that 
we cannot use the top-k search task for this comparison. 
The reason is that the MUNICH and PROUD techniques 
have a notion of probability (Equation 2). This means that 
these techniques can produce different rankings when the 
threshold $\varepsilon$ changes. For example, assume that we increase 
$\varepsilon$ (maintaining $\tau$ fixed). Then the ordering of the time series 
in a top-k ranking may change, since the probability that the 
time series are similar within distance $\varepsilon_1 > \varepsilon$ may increase. 
Thus, in the case of uncertain time series, MUNICH and 
PROUD might produce very different top-k answers even if 
$\varepsilon$ varies a little. This, in turn, means that the top-k task is 
not suitable for comparing the three techniques. 

We instead perform the comparison using the task of time 
series similarity matching. Even though DUST is not a sim- 
ilarity matching technique (like PROUD and MUNICH), it 
can still be used to find similar time series, when we specify 
a maximum threshold on the distance between time series. 
In [23], the evaluation of DUST was based on top-k similar 
time series. However, we note that this problem includes the 
problem of similarity matching [9], where the most similar 
time series form the answer to the top-k query. 

Following the above discussion, in order to perform a 
fair comparison we need to specify distance thresholds for 
all three techniques. This translates to finding equivalent 
thresholds e for each one of the techniques. We proceed as 
follows. 

Since the distances in MUNICH and PROUD are based 
on the Euclidean distance, we will use the same threshold 
for both methods, $\varepsilon_{eucl}$. Then, we calculate an equivalent 
threshold for DUST, $\varepsilon_{dust}$. Given a query q and a dataset 
C, we identify the 10th nearest neighbor of q in C. Let that 
be time series c. We define $\varepsilon_{eucl}$ as the Euclidean distance 
on the observations between q and c, and $\varepsilon_{dust}$ as the DUST 
distance between q and c. This procedure is repeated for 
every query q. 
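
The threshold calibration described above can be sketched as follows. We assume, as our reading of the procedure, that the 10th nearest neighbor is identified on the exact (ground-truth) series using the Euclidean distance, and that both thresholds are then measured on the perturbed observations; the DUST distance is passed in as a function, and all names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def equivalent_thresholds(q_exact, q_obs, exact, observed, dust_distance, k=10):
    """Return (eps_eucl, eps_dust) for one query.

    exact[i] and observed[i] are the ground-truth and perturbed versions
    of the same series; the k-th neighbor is chosen on the exact data.
    """
    order = sorted(range(len(exact)), key=lambda i: euclidean(q_exact, exact[i]))
    c = order[k - 1]  # index of the k-th nearest neighbor
    return euclidean(q_obs, observed[c]), dust_distance(q_obs, observed[c])


# Toy data: 20 one-point series; a stand-in DUST that halves the Euclidean distance.
exact = [[float(i)] for i in range(20)]
observed = [[i + 0.5] for i in range(20)]
eps_eucl, eps_dust = equivalent_thresholds(
    [0.0], [0.0], exact, observed, lambda a, b: euclidean(a, b) / 2.0)
```
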
The quality of results of the different techniques is evaluated
by comparing the query results to the ground truth. 
We performed experiments for each dataset separately, using
each one of the time series as a query and performing a 
similarity search. In the graphs, we report the averages of 
all these results, as well as the 95% confidence intervals (see footnote 3). 

4.2 Quality Performance 

In order to evaluate the quality of the results, we used 
the two standard measures of recall and precision. Recall 
is defined as the percentage of the truly similar uncertain 
time series that are found by the algorithm. Precision is the 
percentage of similar uncertain time series identified by the 
algorithm, which are truly similar. Accuracy is measured in 
terms of $F_1$ score to facilitate the comparison. The $F_1$ score 
is defined by combining precision and recall: 

$$F_1 = 2 \cdot \frac{\mathit{precision} \cdot \mathit{recall}}{\mathit{precision} + \mathit{recall}} \tag{14}$$

We verify the results with the exact answer using the 
ground truth, and compare the results with the algorithm 
output (as described in Section 4.1.2). 
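
A minimal sketch of how these quality measures can be computed for one query is given below. The sets hold identifiers of time series; the function name and the convention of returning zero for empty sets are our own choices, and the $F_1$ score is computed in its standard form as the harmonic mean of precision and recall.

```python
def precision_recall_f1(returned, relevant):
    """Compare the ids returned by a technique with the ground-truth ids."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1({1, 2, 3, 4}, {2, 3, 5})
```

In this example two of the four returned series are truly similar (precision 1/2), and two of the three truly similar series are found (recall 2/3).
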

4.2.1 Accuracy 

The first experiment represents a special case with restricted
settings. This was necessary because the 
computational cost of MUNICH was prohibitive for a full-scale
experiment. We compare MUNICH, PROUD, DUST 
and Euclidean on the Gun_Point dataset, truncating it to 
60 time series of length 6. For each timestamp, we have 5 
samples as input for MUNICH. Results are averaged over 5 
random queries. For both MUNICH and PROUD we are 
using the optimal probabilistic threshold, $\tau$, determined after
repeated experiments. Distance thresholds are chosen 
(according to Section 4.1.2) such that in the ground truth 
set they return exactly 10 time series. 

The results with Gaussian error (refer to Figure 4(a)) 
show that all techniques perform well ($F_1 > 80\%$) when the 
standard deviation of the errors is low ($\sigma = 0.2$), with MUNICH
being the best performer ($F_1 = 88\%$). However, as the 
standard deviation increases to 2, the accuracy of all techniques
decreases. This is expected, since a larger standard 
deviation means that the time series have more uncertainty. 
The behavior of MUNICH, though, is interesting: its accuracy
falls sharply for $\sigma > 0.6$. 

This trend was verified also with uniform and exponential 
error distributions, as reported in Figures 4(b) and 4(c). 
With exponential error, the performance of MUNICH is 
slightly better than with normal, or uniform error distribu- 
tions. However, MUNICH still performs much worse than 
PROUD and DUST for $\sigma > 0.6$. 

Figure 5(a) shows the results of the same experiment, but 
just for PROUD, DUST, and Euclidean. In this case (and 
for all the following experiments), we report the average re- 
sults over the full time series for all datasets. Once again, 
the error distribution is normal, and PROUD is using the 
optimal threshold, $\tau$, for every value of the standard deviation. 



3 Please note that the results we report are not directly com- 
parable to those in the original papers. In our study, we use 
a different experimental setup, in order to make possible the 
comparison of the three techniques. 



The results show that there is virtually no difference among 
the different techniques. This observation holds across the 
entire range of standard deviations that we tried ($0.2 \leq \sigma \leq 2$). 

The results for the uniform and exponential distributions 
are very similar, and reported in Figures 5(b) and 5(c). 
With both uniform and exponential errors, PROUD performs
slightly better for $\sigma = 0.2$, and its performance drops 
slightly below DUST and Euclidean for larger error standard 
deviations. 

With uniform error, the accuracy of DUST drops by nearly 
10% for $\sigma = 0.2$ (refer to Figure 5(b)). This apparently 
insignificant observation turned out to be due to how the 
DUST lookup tables are determined: when the error is 
uniformly distributed, $\phi(|x_i - y_i|)$ may be equal to zero in 
some cases. Consequently, $\mathit{dust}(x, y)$ cannot be evaluated 
for these cases, as it degenerates to the logarithm of zero. 
We tried to solve this technical problem by adding two tails 
to the uniform error, so that the error probability density 
function is never exactly zero. This workaround proved use- 
ful, but did not completely solve the problem. 
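
The degenerate case can be reproduced with a small sketch. We assume, for illustration only, that $\phi$ is the density of the difference of two independent errors that are uniform on $[-a, a]$, which is triangular and vanishes for differences larger than $2a$. The optional `floor` parameter is a hypothetical stand-in for the tails mentioned above, not the construction used in our experiments.

```python
import math


def phi_uniform(delta, half_width):
    """Density of the difference of two independent U(-a, a) errors at delta."""
    a = half_width
    return max(0.0, (2.0 * a - abs(delta)) / (4.0 * a * a))


def dust_uniform(x, y, half_width, floor=0.0):
    """dust under uniform errors; fails when phi is zero and no floor is given."""
    phi = max(phi_uniform(abs(x - y), half_width), floor)
    if phi == 0.0:
        raise ValueError("phi is zero: dust would require the logarithm of zero")
    k = -math.log(max(phi_uniform(0.0, half_width), floor))
    return math.sqrt(max(-math.log(phi) - k, 0.0))


try:
    dust_uniform(0.0, 3.0, half_width=1.0)  # |x - y| exceeds 2a
    degenerate = False
except ValueError:
    degenerate = True

with_floor = dust_uniform(0.0, 3.0, half_width=1.0, floor=1e-6)
```

With a positive floor the value is finite, but it is determined by the arbitrary choice of the floor rather than by the data.
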

4.2.2 Precision and Recall 

In order to better understand the behavior of the differ- 
ent techniques, we take a closer look at precision and re- 
call. Figures 6(a) and 6(b) show the precision and recall for 
PROUD, as a function of the error standard deviation, when 
the distribution of the error follows a uniform, a normal, and 
an exponential distribution. PROUD is using the optimal 
threshold, $\tau$, for every value of the standard deviation. 

The graphs show that recall always remains relatively high
(between 63% and 83%). In contrast, precision is heavily
affected, decreasing from 70% to a mere 16% as the standard
deviation increases from 0.2 to 2. Therefore, processing
uncertain time series with an increasing standard deviation in
their error does not have a significant impact on the false
negatives. However, it leads to many false positives, which
may be an undesirable effect.
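For reference, precision is TP / (TP + FP), recall is TP / (TP + FN), and the F1 score is their harmonic mean. The following Python sketch computes the three measures for a single query. The function name and the representation of answers as sets of series identifiers are assumptions made for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F1 for one query.

    retrieved: identifiers returned by the technique under test
    relevant:  identifiers in the ground-truth answer
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Five answers returned, two of them correct, one correct answer missed.
p, r, f1 = precision_recall_f1({1, 2, 3, 4, 5}, {1, 2, 6})
print(p, r, f1)
```

In this toy example three of the five returned answers are false positives, which lowers precision, while only one relevant answer is missed, so recall stays comparatively high.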

The corresponding results for DUST are shown in Fig- 
ures 7(a) and 7(b). We observe the same trends as before, 
the only difference being that DUST achieves slightly better 
precision, but lower recall. 

4.2.3 Mixed Error Distributions 

While in all previous experiments the error distribution
is constant across all the values of a time series, in this
experiment we evaluate the accuracy of PROUD, DUST, and
Euclidean when different error distributions are present
in the same time series (Figure 8). Each time series has been
perturbed with normal error, but of varying standard
deviation. Namely, the error for 20% of the values has standard
deviation 1, and the error for the remaining 80% has standard
deviation 0.4.
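A perturbation of this kind can be sketched in Python as follows. This is a minimal illustration, not the code used in the experiments: the text does not say how the 20% of values are selected, so choosing their positions uniformly at random is an assumption, as are the function and parameter names.

```python
import random


def perturb_mixed_normal(series, frac_high=0.2, sigma_high=1.0,
                         sigma_low=0.4, seed=None):
    """Add zero-mean normal error whose standard deviation varies per point.

    A fraction frac_high of the positions (picked at random, an assumption)
    receives error with standard deviation sigma_high; the remaining
    positions receive error with standard deviation sigma_low.
    Returns the noisy series and the per-point standard deviations.
    """
    rng = random.Random(seed)
    n = len(series)
    n_high = round(frac_high * n)
    high_positions = set(rng.sample(range(n), n_high))
    sigmas = [sigma_high if i in high_positions else sigma_low
              for i in range(n)]
    noisy = [v + rng.gauss(0.0, s) for v, s in zip(series, sigmas)]
    return noisy, sigmas


noisy, sigmas = perturb_mixed_normal([0.0] * 100, seed=0)
print(sigmas.count(1.0), sigmas.count(0.4))
```

The per-point standard deviations are returned alongside the noisy values because techniques that model the error need them as input.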

We note that this is a case that PROUD cannot handle, 
since it does not have the ability to model different error 
distributions within the same time series (in this experi- 
ment, PROUD was using a standard deviation setting of 
0.7). Therefore, PROUD does not produce better results 
than Euclidean. On the other hand, DUST is taking into 
account these variations of the error, and achieves a slightly 
improved accuracy (3% more than PROUD and Euclidean). 

We also conducted the same experiment with the following
changes: (i) informing DUST (wrongly) that the standard
deviation is 0.7, and (ii) perturbing a time series with a
mixture of uniform, normal, and exponential distributions
(this situation cannot be handled by PROUD).

Figure 4: F1 score for MUNICH, PROUD, DUST and Euclidean on Gun_Point truncated dataset, when varying the
error standard deviation: normal error distribution (left), uniform (center), exponential (right).

Figure 5: F1 score for PROUD, DUST and Euclidean, averaged over all datasets, when varying the error standard
deviation: normal error distribution (left), uniform (center), exponential (right).

Figure 8: F1 score for PROUD, DUST, and Euclidean on
all the datasets with mixed error distribution (normal),
20% with standard deviation 1.0, and 80% with standard
deviation 0.4.

Figure 9: F1 score for PROUD, DUST, and Euclidean on
all the datasets with mixed error distribution (uniform,
normal, and exponential), 20% with standard deviation
1.0, and 80% with standard deviation 0.4.

As shown in Figures 9 and 10, in both these experiments 
the accuracy of all techniques (PROUD, DUST, and Eu- 
clidean) is almost the same, and consistently lower for the 
second experiment. These results indicate that in situations 
where we do not have enough, or accurate information on 
the distribution of the error, PROUD and DUST do not 
offer an advantage when compared to Euclidean. 



4.3 Time Performance 

In Figure 11, we report the CPU time per query for the 
normal error distribution when varying the error standard 
deviation in the range [0.2, 2.0]. The results for uniform and 
exponential distributions are very similar, and omitted for 
brevity. 

The graph shows that the standard deviation of the normal
distribution only slightly affects performance for DUST.
As expected, the execution time for Euclidean is not affected
at all when the standard deviation for the error of
the uncertain time series varies, and Euclidean exhibits the
best time performance of all techniques.

Figure 6: Precision and recall for PROUD, averaged over all datasets, when varying error standard deviation and
error distribution.

Figure 7: Precision and recall for DUST, averaged over all datasets, when varying error standard deviation and error
distribution.

Figure 10: F1 score for PROUD, DUST, and Euclidean
on all the datasets with mixed error distribution: normal,
with standard deviation erroneously reported as
constant 0.7.

We note that for PROUD we did not use the wavelet
synopsis, since we did not use any summarization technique
for the other techniques either. However, it is possible to
apply PROUD on top of a Haar wavelet synopsis. This
results in CPU time for PROUD that is less than or equal to
the CPU time of Euclidean, while maintaining high accuracy
[29].

We did not include the time performance for MUNICH in 
this graph, because it is orders of magnitude more expensive 
than the other techniques (i.e., on the order of minutes).

In Figure 12, we report the CPU time per query for the
normal error distribution when varying the time series length
between 50 and 1000 time points. Time series of different
lengths were obtained by resampling the raw sequences.
The graph shows that the time grows linearly with the time
series length. The results for uniform and exponential
distributions are very similar, and omitted for brevity.

5. MOVING AVERAGE FOR UNCERTAIN 
TIME SERIES 

The moving average is among the simplest filters for noise
reduction in signal processing. In this section, we compare
some basic adaptations of the moving average to the DUST
and Euclidean distances, and evaluate their effectiveness.
We note that, similar to the Euclidean and DUST distances,
these adaptations do not provide any quality guarantees in the
context of uncertain time series similarity matching. (In
contrast, MUNICH and PROUD provide an additional probabilistic
measure of certainty for the computed similarity value.)
Nevertheless, the measures we describe below take a first step
away from the assumption, which the techniques we examined
so far make, that neighboring points in a time series
are independent.

Figure 11: Average time per query for PROUD,
DUST, and Euclidean, averaged over all datasets,
when varying the error standard deviation with normal
error distribution.

Figure 12: Average time per query for PROUD,
DUST, and Euclidean, averaged over all datasets,
when varying the time series length with normal
error distribution.

5.1 Neighborhood-Aware Models 

Given a series of noisy measurements $S = \langle v_1, v_2, \ldots, v_n \rangle$,
the moving average of these measurements, $S^m$, is obtained
by substituting each value $v_i$ with $v_i^m$, defined as the average
of values $v_{i-w}, \ldots, v_{i+w}$:

$$v_i^m = \frac{\sum_{j=i-w}^{i+w} v_j}{2w+1} \qquad (15)$$

where w is a user-defined parameter that defines the window
width 2w + 1 to be considered in the average. In the
moving average, all samples are weighted equally.
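The moving average filter can be sketched in a few lines of Python. The handling of the first and last w points is not specified in the text. Truncating the window at the sequence boundaries and averaging over the points that remain is an assumption of this sketch.

```python
def moving_average(values, w):
    """Centered moving average with half-window w (full width 2w + 1).

    Near the boundaries the window is truncated and the average is taken
    over the available points only (an assumption of this sketch).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - w), min(n - 1, i + w)
        window = values[lo:hi + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], 1))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

With w = 0 the window contains only the current point, so the sequence is returned unchanged.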

A variant of the moving average, namely the exponential
moving average, has been introduced to give more weight to the
nearest neighbors of the current value, through an exponential
decaying factor. The exponential moving average of
sequence S, $S^e$, is obtained by substituting each value $v_i$
with $v_i^e$, defined as follows:

$$v_i^e = \frac{\sum_{j=i-w}^{i+w} v_j \, e^{-\lambda |j-i|}}{\sum_{j=i-w}^{i+w} e^{-\lambda |j-i|}} \qquad (16)$$

where $\lambda$ controls the exponential decaying factor.
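A minimal Python sketch of the exponential moving average follows, with each neighbor weighted by an exponentially decaying function of its distance from the current point and the result normalized by the sum of the weights. The parameter name `lam` for the decaying factor and the truncation of the window at the sequence boundaries are assumptions of this sketch.

```python
import math


def exponential_moving_average(values, w, lam):
    """Centered exponential moving average with half-window w.

    The neighbor at offset |j - i| receives weight exp(-lam * |j - i|).
    Normalizing by the sum of the weights also takes care of windows
    truncated at the boundaries (an assumption of this sketch).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        numerator = 0.0
        denominator = 0.0
        for j in range(max(0, i - w), min(n - 1, i + w) + 1):
            weight = math.exp(-lam * abs(j - i))
            numerator += values[j] * weight
            denominator += weight
        smoothed.append(numerator / denominator)
    return smoothed


# With lam = 0 all weights equal 1 and the plain moving average results.
print(exponential_moving_average([1, 2, 3, 4, 5], 1, 0.0))
```

A large decaying factor concentrates almost all the weight on the current point, so the output approaches the input sequence regardless of w.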

The above two moving average filters require no a priori 
knowledge of the data distribution, and their parameters 
are intuitive and easy to tune, thus making these techniques 
widely adopted in the real world. In the next paragraphs, 
we introduce two variants of the moving and exponential 
moving averages that exploit the a priori knowledge of the 
error standard deviation. 

Intuitively, we can give less weight to the observations drawn
from random variables that exhibit larger error standard
deviation, as we have less confidence in the correctness of their
value. The Uncertain Moving Average (UMA) is based on
the moving average, where sequence S is substituted by $S^p$,
and the point $p_i^m$ is defined as follows:

$$p_i^m = \frac{\sum_{j=i-w}^{i+w} \frac{v_j}{s_j}}{2w+1} \qquad (17)$$

where $s_j$ is the standard deviation of random variable $t_j$.
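The following Python sketch illustrates UMA under a stated assumption about Eq. (17): each value is divided by its error standard deviation before the window average is taken, so that observations with larger error contribute less. The function name, the requirement that all standard deviations be positive, and the truncation of the window at the sequence boundaries are also assumptions.

```python
def uncertain_moving_average(values, sigmas, w):
    """UMA sketch: moving average of values scaled by 1 / sigma.

    values: observed values
    sigmas: per-point error standard deviations (must be positive)
    w:      half-window, so the full window width is 2w + 1
    Windows are truncated at the boundaries and averaged over the
    available points (an assumption of this sketch).
    """
    if len(values) != len(sigmas):
        raise ValueError("values and sigmas must have the same length")
    n = len(values)
    filtered = []
    for i in range(n):
        lo, hi = max(0, i - w), min(n - 1, i + w)
        scaled = [values[j] / sigmas[j] for j in range(lo, hi + 1)]
        filtered.append(sum(scaled) / len(scaled))
    return filtered


# The middle observation has twice the error of its neighbors,
# so it is halved before averaging.
print(uncertain_moving_average([2.0, 4.0, 6.0], [1.0, 2.0, 1.0], 1))
```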

The Uncertain Exponential Moving Average (UEMA) is
based on the exponential moving average, where sequence S
is substituted by $S^e$, and point $p_i^e$ is defined as follows:

$$p_i^e = \frac{\sum_{j=i-w}^{i+w} \frac{v_j}{s_j} \, e^{-\lambda |j-i|}}{\sum_{j=i-w}^{i+w} e^{-\lambda |j-i|}} \qquad (18)$$



At this point, we have introduced the UMA and UEMA 
filters. These filters allow us to reduce the signal noise, 
but do not define any distance function. In the subsequent 
experiments, we consider the Euclidean distance computed 
on the sequences filtered by UMA and UEMA techniques. 
Thus, Euclidean, UMA, and UEMA share the same distance 
function, but the input sequence is different. 
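The pipeline described above, filtering both sequences and then computing the Euclidean distance on the filtered output, can be sketched in Python as follows. This is an illustration under assumptions rather than the implementation used in the experiments: UEMA is taken to scale each value by the inverse of its error standard deviation and to weight neighbors by exp(-lam * |j - i|), windows are truncated at the boundaries, and the function names and default parameter values are placeholders.

```python
import math


def uema(values, sigmas, w, lam):
    """UEMA sketch: exponentially weighted average of values / sigma."""
    n = len(values)
    filtered = []
    for i in range(n):
        numerator = 0.0
        denominator = 0.0
        for j in range(max(0, i - w), min(n - 1, i + w) + 1):
            weight = math.exp(-lam * abs(j - i))
            numerator += (values[j] / sigmas[j]) * weight
            denominator += weight
        filtered.append(numerator / denominator)
    return filtered


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def uema_distance(x, x_sigmas, y, y_sigmas, w=2, lam=1.0):
    """Euclidean distance computed on the UEMA-filtered sequences."""
    return euclidean(uema(x, x_sigmas, w, lam), uema(y, y_sigmas, w, lam))


# With w = 0 and unit standard deviations the filter is the identity,
# so the plain Euclidean distance is obtained.
print(uema_distance([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0], w=0))
```

Keeping the filter and the distance function separate mirrors the point made in the text: only the input sequences change, while the distance function is shared with the Euclidean baseline.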

5.2 Performance 

We first examine the behavior of UMA and UEMA when
we vary the parameters window size, w, and decaying factor,
λ. Figure 13 depicts the effect of varying w between 0 and 20
on the F1 score. The results are averaged over all datasets.
Note that when w = 0, UMA and UEMA degenerate to the
simple Euclidean distance. We observe that the accuracy for
UMA increases by 13% as we increase w from 0 to 2, and
then starts falling again as we further increase w. Evidently,
aggregating many points (i.e., large w) is as ineffective
as not aggregating at all (i.e., w = 0), since distant neighbors
do not carry much (if any) information about the current
point.

The graph also shows the performance of UEMA for two
different λ settings. For a small decaying factor, λ = 0.1,
UEMA performs very close to UMA, since all the points
in the window are assigned similar weights. This effect
diminishes as w increases and λ introduces a higher variation
among the weights of the near and distant neighbors of the
current point. When we use a high value for the decaying
factor, λ = 1, the effect of the distant neighbors diminishes
much faster, thus rendering the size of the window irrelevant
for the performance of UEMA.

Figure 13: F1 score varying the window size, w, for
UMA and UEMA (with λ = 0.1, 1).

Figure 14: F1 score varying the decaying factor, λ, for
UEMA (for w = 5, 10). Horizontal axis: decaying factor, from
0.1 to 1. Plotted series: UEMA-5 and UEMA-10.

In Figure 14 we illustrate how the accuracy of UEMA
varies when we change λ (the case λ = 0 is equivalent to
UMA). The experiments show that λ has only a small effect
on the performance of the algorithm, especially when the
size of the window is small.

Overall, we note that UMA and UEMA exhibit a relatively
stable behavior with respect to their parameters. For
the rest of this study, we assume a decaying factor of λ = 1
for UEMA, and a moving average window length W = 5
(i.e., w = 2) for both UMA and UEMA.

In the next set of experiments, we compare the accuracy
of the Euclidean, DUST (see footnote 4), UMA, and UEMA
techniques on all datasets perturbed with a mixed normal error
distribution, where 20% of the points have error standard
deviation 1.0, and the remaining 80% have error standard
deviation 0.4. This setting was chosen to stress-test the
techniques. Every time series in each dataset was used as a
query, and the results are averaged over all these time series.

4 Based on the previous experiments, DUST performs at
least as well as, or better than, MUNICH and PROUD for

Figure 16 depicts the results for the above experiment.
The accuracy of DUST and Euclidean is almost the same,
while UMA and UEMA perform consistently better, with
the latter achieving the best performance among all
techniques. Similar results were obtained for the uniform and
exponential mixed error distributions, as shown in Figures 15
and 17, respectively.

The graphs show that (on average, across all datasets) Eu- 
clidean is always the worst performer, with a drop of 9% in 
its performance for the mixed exponential error distribution, 
which represents the hardest case. DUST performs close to 
Euclidean for the mixed normal and uniform distribution, 
but manages to maintain the same level of performance for 
the mixed exponential distribution as well. 

UMA and UEMA exhibit the highest accuracy levels, averaging
4% to 15% higher than DUST, and
maintaining the same level of performance across all error 
distributions. Overall, the F1 score of UEMA is 4% higher
than that of UMA.

The above results are very interesting: the intuitive and 
simple UMA and UEMA techniques outperform DUST, a 
complex method that requires much more a priori knowl- 
edge on the data distributions. Instead, these experiments 
indicate that much of the knowledge is conveyed in the er- 
ror standard deviation, and in the distribution of the neigh- 
boring points. UMA and UEMA are the best performers, 
because they do not assume that data points are indepen- 
dent, a simplifying, yet unrealistic assumption made by the 
techniques previously proposed in the literature. 

Note that UMA and UEMA are also computationally effi- 
cient, requiring almost the same time as Euclidean, and sig- 
nificantly less time than DUST, PROUD, and MUNICH. All 
the above observations indicate that UEMA is the method 
of choice for similarity matching in uncertain time series, 
when a probabilistic measure of certainty for the similar- 
ity is not required. Even when such a measure is required, 
UEMA can serve as a baseline for the target performance. 

6. DISCUSSION 

In this work, we reviewed the existing techniques for sim- 
ilarity matching in uncertain time series, and performed 
analytical and experimental comparisons of the techniques. 
Based on our evaluation, we can provide some guidelines for 
the use of these techniques. 

MUNICH and PROUD are based on the Euclidean dis- 
tance, while DUST proposes a new distance measure. Nev- 
ertheless, DUST outperforms Euclidean only if the distribu- 
tion of the observation errors is mixed, and the parameters 
of this distribution are known. 

An important factor for choosing among the available
techniques is the information that is available about the
distribution of the time series and its errors. When we do not
have enough, or accurate, information on the distribution of
the error, PROUD and DUST do not offer an advantage
in terms of accuracy when compared to Euclidean.
Nevertheless, Euclidean does not provide quality guarantees while
MUNICH and PROUD do.

4 (continued) a variety of settings. Therefore, we only report
the performance of DUST in these experiments for ease of
exposition.

Figure 15: F1 score for all datasets and mixed error
distribution: uniform with 20% with standard deviation 1.0,
and 80% with standard deviation 0.4.

Figure 16: F1 score for all datasets and mixed error
distribution: normal with 20% with standard deviation 1.0,
and 80% with standard deviation 0.4.

Figure 17: F1 score for all datasets and mixed error
distribution: exponential with 20% with standard deviation
1.0, and 80% with standard deviation 0.4.

The probabilistic threshold r has a considerable impact 
on the accuracy of the MUNICH and PROUD techniques. 
However, it is not obvious how to set r, and no theoretical
analysis has been provided on that. The only way to pick 
the correct value is by experimental evaluation, which can 
sometimes become cumbersome. 
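Such an experimental selection amounts to a sweep over candidate threshold values, keeping the one with the best observed accuracy. The Python sketch below shows the idea. The function name and the `score_fn` callable, which stands for running the query workload with a given threshold and returning its F1 score, are hypothetical, and the toy scoring function exists only to make the example runnable.

```python
def pick_threshold(candidates, score_fn):
    """Return the candidate threshold with the highest score.

    candidates: iterable of threshold values to try
    score_fn:   callable mapping a threshold to an accuracy score
                (hypothetical placeholder for a full experimental run)
    """
    best_threshold = None
    best_score = float("-inf")
    for threshold in candidates:
        score = score_fn(threshold)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


def toy_score(threshold):
    """Stand-in scoring function that peaks at a threshold of 0.6."""
    return 1.0 - abs(threshold - 0.6)


candidates = [i / 10 for i in range(1, 10)]
print(pick_threshold(candidates, toy_score))
```

Each call to the scoring function corresponds to a complete evaluation over the query workload, which is why the sweep can become expensive when many candidates or large datasets are involved.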

Our experiments showed that MUNICH is applicable only 
in the cases where the standard deviation of the error is 
relatively small, and the length of the time series is also small 
(otherwise the computational cost is prohibitive). However, 
we note that this may not be a restriction for some real 
applications. Indeed, MUNICH's high accuracy may be a
strong point when deciding the technique to use. 

The UMA and UEMA moving average filters proved to 
be very effective, outperforming the previous techniques in 
a variety of settings. This surprising result is due to the 
ability of the moving average to exploit the correlation of 
neighboring points in a very intuitive and simple manner: it 
reduces the effect of errors, which the filter levels out.
Ignoring the strong correlation exhibited by neighboring points in
the time series is not beneficial. Indeed, as our study shows,
it is a severe limitation of all the techniques previously
proposed in the literature.

However, we should note that the goal of MUNICH and 
PROUD is to provide an additional probabilistic measure of 
certainty for the computed similarity value. This is some- 
thing that we cannot readily get from UMA, UEMA, or 
DUST, and may be important for certain applications. 

Finally, we observe that there exist some datasets for 
which all techniques perform well (e.g., FaceFour and OSU- 
Leaf), and others for which accuracy is low (e.g., Adiac 
and Swedish Leaf). A close look at the characteristics of 
these datasets revealed that datasets for which the aver- 
age distance between time series was low led to low accu- 
racy. This is because uncertainty has a significant impact for 
these datasets, making it hard to distinguish the time series 
and select a clear winner for the similarity matching problem.
On the other hand, the same level of uncertainty has
little effect on datasets that have a high average distance
among their time series.
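The dataset characteristic discussed above can be estimated with a short Python sketch that averages the distance over all pairs of series in a dataset. The text does not state which distance was used for this analysis. The Euclidean distance, equal-length series, and the function name are assumptions of the sketch.

```python
import math
from itertools import combinations


def average_pairwise_distance(dataset):
    """Mean Euclidean distance over all unordered pairs of series.

    dataset: list of equal-length numeric sequences (at least two)
    """
    pairs = list(combinations(dataset, 2))
    if not pairs:
        raise ValueError("need at least two time series")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


toy_dataset = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
print(average_pairwise_distance(toy_dataset))
```

Comparing this value with the error standard deviation gives a rough indication of how easily perturbed series can still be told apart.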



7. CONCLUSIONS 

The emerging area of uncertain time series processing 
and analysis is increasingly attracting the attention of both 
the research community and the practitioners in the field, 
mainly because of the applications and interesting problems 
it entails. 

In this study, we evaluated the state-of-the-art techniques
for similarity matching in uncertain time series, as this oper- 
ation is the basis for more complex algorithms. Apart from 
the techniques that were previously proposed in the litera- 
ture, we also evaluated two additional, obvious alternatives 
that were not studied before. 

Our experiments were based on 17 real, diverse datasets, 
and the results demonstrate that simple measures, based on 
moving average, outperform the more sophisticated alterna- 
tives. These results also suggest that a promising direction 
is to develop measures that take into account the sequential 
correlations inherent in time series. 